Chain motifs: The tails and handles of complex networks 
A large part of the interest in complex networks has been motivated by the presence of structured,
frequently non-uniform, connectivity. Because diverse connectivity patterns tend to result in distinct
network dynamics, and also because they provide the means to identify and classify several types of
complex networks, it becomes important to obtain meaningful measurements of the local network
topology. In addition to traditional features such as the node degree, clustering coefficient and
shortest path, motifs have been introduced in the literature in order to provide a complementary
description of network connectivity. The current work proposes a new type of motif, namely
chains of nodes, i.e. sequences of connected nodes with degree two. These chains are
subdivided into cords, tails, rings and handles, depending on the type of their extremities (e.g. open
or connected). A theoretical analysis of the density of such motifs in random and scale-free networks
is described, and an algorithm for identifying those motifs in general networks is presented. The
potential of considering chains for network characterization is illustrated with respect to
five categories of real-world networks including 16 cases. Several interesting findings were obtained,
including the fact that several chains were observed in the real-world networks, especially the WWW,
books, and power grid. The possibility of chains resulting from incompletely sampled networks is
also investigated.

PACS numbers: 89.75.Fb, 02.10.Ox, 89.75.Da, 87.80.Tq
I. INTRODUCTION

A large number of interesting dynamic systems can 
be studied and modeled by first representing them as 
networks and then considering specific dynamic mod- 
els. Because the latter depend greatly on the connec- 
tivity of the network, it becomes critical to obtain good 
characterizations of the respective connectivity structure. 
Such a characterization is even more important in cases 
when the dynamics is not considered, e.g. while analyz- 
ing a frozen instance of systems such as the Internet 
and protein-protein interaction networks. Therefore, it is
hardly surprising that a great deal of effort (e.g. [1]) has
been invested in developing new measurements capable
of providing meaningful and comprehensive characterization
of the connectivity structure of complex networks.

Traditional measurements of the topology of complex 
networks include the classical vertex degree and the clus- 
tering coefficient (e.g. [1]). Both these features are de- 
fined for each vertex in the network and express the con- 
nectivity only at the immediate neighborhood of that ref- 
erence vertex. Other measurements such as the minimum 
shortest path and betweenness centrality reflect the con- 
nectivity of broader portions of the network. Hierarchical 
measurements (e.g. [3, 0, S 0) such as the hierarchical 
vertex degree and hierarchical clustering coefficient, also 
applicable to individual reference vertices, have been pro- 
posed in order to reflect the connectivity properties along 
successive hierarchical neighborhoods around the refer- 
ence vertex. Another interesting family of measurements 
of the topological properties of complex networks involves
the quantification of the frequency of basic motifs in the
network (e.g. 0, i i, IH). Motifs are subgraphs corresponding
to the simplest structural elements found in networks, in the
sense of involving a small number of vertices and edges.
Examples of motifs include feed-forward loops, cycles of
order three and bi-fans.

The study of chains of nodes in networks has so far received
only preliminary attention. Costa [11] studied the effect of
chains on the fractal dimension as revealed by dilations along
networks. Kaiser and Hilgetag [12] studied the vulnerability
of networks involving linear chains with an open extremity.
In another work [13], they addressed the presence of this
same type of motif in a sparse model of spatial networks.
More recently, Levnajic and Tadic [14] investigated the
dynamics in simple networks including linear chains of nodes.

Although several measurements are now available in 
the literature, their application will always be strongly 
related to each specific problem. In other words, there 
is no definitive or complete set of measurements for the 
characterization of the topology of complex networks. 
For instance, in case one is interested in the commu- 
nity structures, measurements such as the modularity 
are more likely to provide valuable and meaningful information
[15]. In this sense, specific new problems will likely continue
to motivate novel, especially suited measurements. The reader
is referred to the survey [1] for a more extensive discussion
of the choice and application of measurements.

The current work proposes a new, complementary way
to characterize the connectivity of complex networks in
terms of a special class of motifs defined by chains of vertices,
i.e. motifs composed of vertices connected sequentially, where
the internal vertices have degree two. These motifs include
cords, tails, rings and handles. While tails and handles have
at least one extremity connected to the remainder of the
network, cords and rings are disconnected from it, being
composed of groups of vertices connected sequentially.
Additional motifs such as two or more handles connected to
the remainder of the network, namely n-handles with n > 2,
can also be defined, but they are not considered in this work.

FIG. 1: The chains can be classified into different types,
depending on the connections among their external vertices.
Six types of chains are shown (dark gray vertices): (a) a
cord, (b) a tail, (c) a two-tail, (d) a ring, (e) a handle and
(f) an n-handle.

Figure 1 illustrates six types of chains, namely (a) a
cord, (b) a tail, (c) a two-tail, (d) a ring, (e) a handle
and (f) an n-handle. The main difference between the
traditional motifs and those defined and characterized in
this article is that the latter may involve a large number of
vertices and edges.

The main motivation behind the introduction of the 
concept of chains in complex networks provided in this 
article is that such a structure is odd in the sense that 
it can be conceptualized as an edge containing a series 
of intermediate vertices which make no branches. In sev- 
eral aspects, such as in flow, the incorporation of such 
intermediate vertices along an edge will imply virtually 
no change on the overall dynamics of that substructure of 
the network. In other words, the same flow capacity will 
be offered by either the isolated edge or its version incor- 
porating a series of intermediate vertices. Interestingly, 
vertices with only two neighbors — henceforth called ar- 
ticulations — seem to have a rather distinct nature and 
role in complex networks, which suggests that they may 
have distinct origins. For instance, as explored further in 
this work, articulations seem to appear in networks gen- 
erated by sequential processes (e.g. word adjacency in 
books), but can also be a consequence of incompleteness 
of the building process of networks. The latter possibility 
is experimentally investigated in this work by considering 
incompletely sampled versions of network models. 

In addition to introducing the concept and a theory
of chains and articulations in complex networks and presenting
means for their identification, the present work also illustrates
the potential of considering the statistics of cords, tails, and
handles for characterizing real-world networks (social,
information, technological, word adjacency in books, and
biological networks). This article starts by presenting the
definition of chains and their categories (i.e. cords, tails,
and handles), and proceeds by developing an analytical
investigation of the density of chains in random and scale-free
models. Next, an algorithm for the identification of such motifs
is described, followed by a discussion of the obtained chain
statistics. The methodology is then applied to the
characterization of real-world complex networks in terms of
chain motifs.



II. CHAINS, CORDS, TAILS, HANDLES, AND 
RINGS 

Given a network with $N$ vertices, consider a sequence
$(n_1, n_2, \ldots, n_{m+1})$ of $m + 1$ vertices $n_i$. If the sequence
has the following properties:

1. There is an edge between vertices $n_i$ and $n_{i+1}$, $1 \le i \le m$;

2. Vertices $n_1$ and $n_{m+1}$ have degree not equal to 2; and

3. Intermediate vertices $n_i$, $2 \le i \le m$, if any, have degree 2;

we call the sequence a chain of length $m$. Vertices $n_1$
and $n_{m+1}$ are called the extremities of the chain.

Chains can be classified in four categories ($k_{n_i}$ is the
degree of vertex $n_i$):

Cords are chains with $k_{n_1} = 1$ and $k_{n_{m+1}} = 1$.

Handles are chains with $k_{n_1} > 2$ and $k_{n_{m+1}} > 2$.

Tails are chains with $k_{n_1} = 1$ and $k_{n_{m+1}} > 2$ (or equivalently
$k_{n_1} > 2$ and $k_{n_{m+1}} = 1$).

Rings (of length $m$) are sequences $(n_1, n_2, \ldots, n_m)$ of $m$
vertices where the degree of each vertex is $k_{n_i} = 2$,
$1 \le i \le m$, $n_i$ is adjacent to $n_{i+1}$ (for $1 \le i \le m - 1$),
and $n_m$ is adjacent to $n_1$.

Rings are a special case of chains in which there are no
extremities, and they were included in the chain classification
only for completeness.

Including the trivial cases with $m = 1$, it is easy to
see that each vertex of degree 1 is at an extremity of a
cord or a tail and each vertex of degree greater than 2
is at an extremity of a tail or a handle. Note that the
definition of handles includes the degenerate case where
the extremities are the same vertex: $n_1 = n_{m+1}$.
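The classification by extremity degree can be sketched in a few lines of Python. This is only an illustration of the definitions above; the function name `classify_chain` and its two integer arguments are assumptions introduced here, not part of the original work.

```python
def classify_chain(deg_first: int, deg_last: int) -> str:
    """Classify a chain from the degrees of its two extremities.

    Hypothetical helper: rings have no extremities and are not handled here.
    """
    if min(deg_first, deg_last) < 1:
        raise ValueError("isolated vertices cannot be extremities")
    if deg_first == 2 or deg_last == 2:
        raise ValueError("extremities of a chain cannot have degree 2")
    if deg_first == 1 and deg_last == 1:
        return "cord"      # both ends are open
    if deg_first > 2 and deg_last > 2:
        return "handle"    # both ends attach to the rest of the network
    return "tail"          # one open end, one attached end
```

For example, a chain whose extremities have degrees 1 and 5 is a tail, regardless of the order in which the extremities are given.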

With these definitions and writing $N_C$, $N_H$, $N_T$, and
$N_R$ for the total number of cords, handles, tails, and
rings, respectively, and $N(k)$ for the number of vertices of
degree $k$, we have:

$$N(1) = 2N_C + N_T, \tag{1}$$

$$\sum_{k>2} k N(k) = 2N_H + N_T. \tag{2}$$

To evaluate the number of vertices of degree 2, we introduce
the notation $N_C(m)$ for the number of cords of
length $m$, and similarly $N_H(m)$ for handles, $N_T(m)$ for
tails, and $N_R(m)$ for rings. Each chain of length $m$ has
$m - 1$ and each ring of length $m$ has $m$ vertices of degree
2, giving:

$$N(2) = \sum_{m=1}^{\infty} \left[ m N_R(m) + (m - 1) \left( N_C(m) + N_H(m) + N_T(m) \right) \right]. \tag{3}$$

Isolated vertices (vertices with degree 0) have no effect
on such structures, and it is considered hereafter that the
network has no isolated nodes.

FIG. 2: The chain can be (a) undirected, (b) directed and (c)
mixed. Mixed chains have arcs in any direction. Note that
(c) and (d) are equivalent.

The chains can also be classified according to the nature
of their connections, as in Figure 2. In undirected networks,
the chains are said to be undirected (Figure 2(a)). In directed
networks, on the other hand, the chains can be classified into
three types:

1. Directed chains are those whose arcs of inner vertices
follow just one direction, i.e. there is a directed
path from one extremity to the other (Figure 2(b)).

2. Undirected chains are defined as for undirected networks,
having undirected arcs between inner vertices (Figure 2(a)).
An undirected arc between vertices i and j exists if there is
an arc from i to j and another from j to i.

3. Mixed chains are those with any other combination
of arc directions, as in Figure 2(c).

In our analysis we consider just undirected networks, but
the extension to directed networks is straightforward.



III. ALGORITHM FOR CHAIN 
IDENTIFICATION 

The algorithm to identify chains of vertices includes
two steps, one for finding chains of size greater than 1
and the other for finding chains of unit size. The first
step is illustrated in Figure 3 and described as follows:

• input: graph G

• output: list L containing all chains of size greater than 1

• calculate the degree of the vertices in G and store them in a
list K

• find the vertices i such that k_i = 2, k_i ∈ K, and store
them in a list Q2

• while Q2 is not empty do

— remove a vertex (A) from Q2 and then insert its
first neighboring vertex (B), A, and its second
neighboring vertex (C) in a queue P (in this order);
if B or C is in Q2, remove it

— while the first or the last element of P has degree
equal to 2 and P can still be extended do

* if the first element of P has degree 2, let D be its
neighboring node that is not yet in P (if any);
include D in P in the first position and,
if D is in Q2, remove it.

* if the last element of P has degree 2, let E be its
neighboring node that is not yet in P (if any);
include E in P in the last position and,
if E is in Q2, remove it.

— insert P in the list L and clear P

The list L contains all chains of size greater than 1.
They can now be classified into cords, tails, and handles
according to the degree of the first and last element of
the corresponding queue. If the inner loop stops while an
end of P still has degree 2, the chain closes on itself: it is
a ring when all its vertices have degree 2, and otherwise a
handle whose two extremities are the same vertex.

The second step, required for identifying the chains of
unit length, is as follows:

• input: graph G, list K and list L

• output: list of cords, tails, and handles of unit size

• find all vertices of degree equal to 1 which are not in
L and store them in a list Q1

• while Q1 is not empty do

— remove a vertex (A) from Q1 and insert it in a queue
P

— if the neighboring node of A also has degree equal
to 1, remove it from Q1, insert it in P, and insert
P in a list C1

— else insert the neighbor of A in P and insert P in a list
T1

• include all edges (pairs of connected vertices) which do not
belong to any chain in L, C1 or T1 in a list H1

The lists C1, T1, and H1 contain, respectively, all
cords, tails, and handles of unit size in the network.
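A minimal Python sketch of chain identification follows. It does not reproduce the two-step queue procedure above literally: instead it walks outwards from every vertex whose degree differs from 2, which finds chains of all lengths (unit length included) in a single pass, and then collects rings from the degree-2 vertices that were never visited. The function names, the adjacency-dictionary representation, and the restriction to simple undirected graphs without isolated vertices are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_adj(edges):
    """Adjacency sets of a simple undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def find_chains(adj):
    """Return (chains, rings).

    A chain is a vertex list [n_1, ..., n_{m+1}] whose inner vertices have
    degree 2 and whose extremities do not; its length is m = len(list) - 1.
    A ring is the list of the m degree-2 vertices of an isolated cycle.
    """
    deg = {v: len(nb) for v, nb in adj.items()}
    chains, rings = [], []
    used = set()  # (extremity, first step) pairs already walked
    for start in adj:
        if deg[start] == 2:
            continue
        for first in adj[start]:
            if (start, first) in used:
                continue
            path, prev, cur = [start, first], start, first
            while deg[cur] == 2:  # follow the articulations
                nxt = next(w for w in adj[cur] if w != prev)
                path.append(nxt)
                prev, cur = cur, nxt
            used.add((start, first))
            used.add((path[-1], path[-2]))  # same chain seen from the far end
            chains.append(path)
    seen = {v for path in chains for v in path}
    for v in adj:
        if deg[v] == 2 and v not in seen:  # leftover degree-2 vertices form rings
            ring, prev, cur = [v], None, v
            while True:
                nxt = next(w for w in adj[cur] if w != prev)
                if nxt == v:
                    break
                ring.append(nxt)
                prev, cur = cur, nxt
            seen.update(ring)
            rings.append(ring)
    return chains, rings

def chain_statistics(adj):
    """Count motifs as a Counter keyed by (kind, length)."""
    deg = {v: len(nb) for v, nb in adj.items()}
    chains, rings = find_chains(adj)
    stats = Counter()
    for path in chains:
        k1, k2 = deg[path[0]], deg[path[-1]]
        kind = "cord" if k1 == k2 == 1 else "handle" if min(k1, k2) > 2 else "tail"
        stats[(kind, len(path) - 1)] += 1
    for ring in rings:
        stats[("ring", len(ring))] += 1
    return stats
```

A quick sanity check on any output of this sketch is to verify the counting relations (1) to (3) against the degree histogram of the input graph.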







FIG. 3: The main steps to identify handles of size greater than 1 in networks include: (i) choose a vertex of degree 2 and add
it to a list (dark gray vertex); (ii) go to its neighbors and also add them if they have degree 2; (iii) go to the next neighbors,
excluding the vertices already added in the list, and also add them if they have degree 2; (iv) stop adding vertices to the list
after finding two vertices of degree greater than 2. In this case, the size of the obtained handle is 6. The same procedure can
also be applied to find cords and tails, but at least one extremity should have degree equal to 1.



IV. STATISTICS

Consider an ensemble of networks completely determined
by the degree-degree correlations $P(k, k')$ [ssj.
Given $P(k, k')$ and the number of vertices $N$ in the network,
we want to evaluate the number of chains of each type and of
rings. The degree distribution $P(k)$ and the conditional
neighbor degree distribution $P(k'|k)$, i.e. the probability
that a neighbor of a vertex with degree $k$ has degree $k'$,
are easily computed:

$$P(k) = \frac{\sum_{k'} P(k, k')/k}{\sum_{k', k''} P(k', k'')/k'}, \tag{4}$$

$$P(k'|k) = \frac{\langle k \rangle P(k, k')}{k P(k)}, \tag{5}$$

where $\langle k \rangle = \sum_k k P(k)$ is the average degree of the network.

A. Rings

For a ring of length $m$, we start at a vertex of degree
2, go through $m - 1$ vertices of degree 2 and reach back
the original vertex. Each transition from a vertex of degree
2 to the other, with the exception of the last one
that closes the ring, has probability $P(2|2)$; the closing
of the ring requires reaching one of the vertices of degree
2 (probability $P(2|2)$) and among them, exactly the starting
one (probability $1/(N P(2))$). If we start from all vertices
of degree 2, each ring will be counted $m$ times, resulting
in:

$$N_R(m) = \frac{1}{m} P(2|2)^m. \tag{6}$$

This expression is valid only for the case of small $m$ and
large $N$, such that the vertices already included in the
ring do not affect significantly the conditional probabilities.
Such an approximation is used throughout this
work. Note that, under this circumstance, when computing
Eq. (3), $N_R(m)$ is of the order of the approximation
error in the expressions of $N_C(m)$, $N_T(m)$, and $N_H(m)$.

B. Cords

Starting from a vertex of degree 1, a cord is traversed
by following through a set of vertices of degree 2 until
reaching a vertex of degree 1 that ends the cord. A cord
of length 1 has no intermediate vertices; starting in a vertex
of degree 1, the probability of finding a cord of length
1 is therefore given by $P(1|1)$. For a cord of length 2, the
edge from the initial vertex should go through a vertex of
degree 2 before arriving at a new vertex of degree 1, giving
$P(2|1)P(1|2)$. For lengths greater than 2, each new
intermediate vertex is reached with probability $P(2|2)$,
and therefore we have $P(2|1)P(2|2)^{m-2}P(1|2)$ [3i| for a
cord of length $m$. Considering that there are $N P(1)$ vertices
of degree 1 in the network, but only half of them
must be taken as starting vertex to find a cord, we arrive
at:

$$N_C(m) = \begin{cases} \frac{1}{2} N P(1) P(1|1) & \text{if } m = 1, \\ \frac{1}{2} N P(1) P(2|1) P(2|2)^{m-2} P(1|2) & \text{if } m > 1. \end{cases} \tag{7}$$
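As an illustration of Eqs. (4) and (5), the following Python sketch derives the degree distribution and the conditional neighbor degree distribution from a joint distribution stored as a dictionary. The function name and the data layout (keys `(k, k')` for the joint distribution, keys `(k', k)` for the conditional one) are assumptions introduced here.

```python
from collections import defaultdict

def degree_distributions(joint):
    """Return (P, cond, mean_k) from a symmetric joint distribution.

    joint[(k, k2)] is P(k, k'); cond[(k2, k)] is P(k'|k).
    """
    q = defaultdict(float)                 # q(k) = sum_{k'} P(k, k')
    for (k, _k2), p in joint.items():
        q[k] += p
    norm = sum(qk / k for k, qk in q.items())       # denominator of Eq. (4)
    P = {k: (qk / k) / norm for k, qk in q.items()}  # Eq. (4)
    mean_k = sum(k * pk for k, pk in P.items())
    cond = {(k2, k): mean_k * p / (k * P[k])         # Eq. (5)
            for (k, k2), p in joint.items()}
    return P, cond, mean_k
```

For a single cord of length 2 (degrees 1, 2, 1) every edge end of degree 1 faces an end of degree 2, so `joint = {(1, 2): 0.5, (2, 1): 0.5}` and the sketch gives P(1) = 2/3, P(2) = 1/3 and P(2|1) = P(1|2) = 1.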



C. Tails

The number of tails can be computed similarly. We
need either to start at a vertex with degree 1 and reach
a vertex of degree greater than 2 or vice versa; only one
of these possibilities must be considered. We arrive at:

$$N_T(m) = \begin{cases} N P(1) P(>2|1) & \text{if } m = 1, \\ N P(1) P(2|1) P(2|2)^{m-2} P(>2|2) & \text{if } m > 1, \end{cases} \tag{8}$$

where the notation $P(>2|k) = \sum_{k'>2} P(k'|k)$ is used.



D. Handles

A handle starts in a vertex of degree $k > 2$ and ends in
a vertex of degree $k' > 2$. Starting from one of the $N P(k)$
vertices of degree $k > 2$ of the network, there are $k$ possibilities
to follow a chain, each characterized by a sequence
of vertices of degree 2 until reaching a vertex of degree
$k' > 2$. This gives a total of $N k P(k) P(>2|k)$ handles of
length 1 and $N k P(k) P(2|k) P(2|2)^{m-2} P(>2|2)$ handles
of length $m > 1$. Summing over all values of $k > 2$, using
$\sum_k k P(k) P(k'|k) = k' P(k')$, which can be deduced
from relations (4) and (5), and considering that each handle
is counted twice when starting from all nodes of degree
greater than 2, we have:

$$N_H(m) = \begin{cases} \frac{1}{2} N \left\{ \langle k \rangle - P(1)\left[2 - P(1|1) - P(2|1)\right] - P(2)\left[4 - 2P(1|2) - 2P(2|2)\right] \right\} & \text{if } m = 1, \\ \frac{1}{2} N \left[ 2P(2) - P(1)P(2|1) - 2P(2)P(2|2) \right] P(2|2)^{m-2} P(>2|2) & \text{if } m > 1. \end{cases} \tag{9}$$

Using Equations (7), (8), and (9) we have

$$\sum_{m=1}^{\infty} (m - 1) \left[ N_C(m) + N_H(m) + N_T(m) \right] = N(2).$$

Comparing this result with Equation (3) we see that the
rings are already counted in the number of chains, as
hinted at the end of Section IV A. This happens because,
while computing the probability of chains, we ignore the
fact that the presence of rings decreases the number of
possible chains. For a large enough network, the number
of rings should be small compared with the number of
the other structures, validating the approximation.

Note that all expressions are proportional to $P(2|2)^m$,
and therefore large chains should be exponentially rare,
if they are not favored by the network growth.
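The counting argument of this section can be evaluated numerically with a short Python sketch. It takes a degree distribution `P` and a conditional distribution `cond` with `cond[(k2, k)]` standing for P(k'|k); this layout and the function name are assumptions made for illustration. The unit-length handle term is written as obtained by summing N k P(k) P(>2|k) over k > 2 with the identity quoted above, which yields the coefficients 2P(1|2) and 2P(2|2) in the last bracket.

```python
def expected_chain_counts(m, N, P, cond):
    """Expected numbers of rings, cords, tails and handles of length m."""
    def p(a, b):                       # P(a|b), zero when absent
        return cond.get((a, b), 0.0)

    def p_gt2(b):                      # P(>2|b) = sum over k' > 2 of P(k'|b)
        return sum(v for (a, bb), v in cond.items() if bb == b and a > 2)

    P1, P2 = P.get(1, 0.0), P.get(2, 0.0)
    mean_k = sum(k * pk for k, pk in P.items())
    rings = p(2, 2) ** m / m
    if m == 1:
        cords = 0.5 * N * P1 * p(1, 1)
        tails = N * P1 * p_gt2(1)
        handles = 0.5 * N * (mean_k
                             - P1 * (2 - p(1, 1) - p(2, 1))
                             - P2 * (4 - 2 * p(1, 2) - 2 * p(2, 2)))
    else:
        inner = p(2, 2) ** (m - 2)     # m - 2 transitions between articulations
        cords = 0.5 * N * P1 * p(2, 1) * inner * p(1, 2)
        tails = N * P1 * p(2, 1) * inner * p_gt2(2)
        handles = 0.5 * N * (2 * P2 - P1 * p(2, 1) - 2 * P2 * p(2, 2)) \
            * inner * p_gt2(2)
    return {"rings": rings, "cords": cords, "tails": tails, "handles": handles}
```

Summing `(m - 1) * (cords + tails + handles)` over m should reproduce N P(2) up to truncation, which is a convenient consistency check on any input distribution.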



V. THEORETICAL ANALYSIS FOR 
UNCORRELATED NETWORKS 

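For uncorrelated networks, where the degree at one
side of an edge is independent of the degree at the other
side of the edge, $P(k, k')$ can be factored as

$$P(k, k') = \frac{k P(k)\, k' P(k')}{\langle k \rangle^2}. \tag{10}$$

The conditional probability is simplified to

$$P(k'|k) = \frac{k' P(k')}{\langle k \rangle}. \tag{11}$$

Using this last expression, we have for uncorrelated networks

$$N_R(m) = \frac{1}{m} \left[ \frac{2P(2)}{\langle k \rangle} \right]^m, \tag{12}$$

$$N_C(m) = \frac{2^{m-2} N P(1)^2 P(2)^{m-1}}{\langle k \rangle^m}, \tag{13}$$

$$N_T(m) = N P(1) \left[ \frac{2P(2)}{\langle k \rangle} \right]^{m-1} \alpha, \tag{14}$$

$$N_H(m) = \frac{N \langle k \rangle}{2} \left[ \frac{2P(2)}{\langle k \rangle} \right]^{m-1} \alpha^2, \tag{15}$$

where

$$\alpha = 1 - \frac{P(1)}{\langle k \rangle} - \frac{2P(2)}{\langle k \rangle}.$$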
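The uncorrelated expressions depend only on P(1), P(2) and the average degree, so they are easy to evaluate for any degree distribution. The Python sketch below is a direct transcription under that reading; the function name and the dictionary layout of `P` are assumptions introduced for illustration.

```python
def uncorrelated_counts(m, N, P):
    """Expected motif counts of length m for an uncorrelated network.

    P maps each degree k to its probability P(k).
    """
    mean_k = sum(k * pk for k, pk in P.items())
    P1, P2 = P.get(1, 0.0), P.get(2, 0.0)
    r = 2 * P2 / mean_k                 # P(2|2) for uncorrelated networks
    alpha = 1 - P1 / mean_k - r         # probability of a neighbor with k > 2
    return {
        "rings": r ** m / m,
        "cords": 2 ** (m - 2) * N * P1 ** 2 * P2 ** (m - 1) / mean_k ** m,
        "tails": N * P1 * r ** (m - 1) * alpha,
        "handles": 0.5 * N * mean_k * r ** (m - 1) * alpha ** 2,
    }
```

Two checks follow from the counting relations of Section II: summing `2 * cords + tails` over all lengths should give N P(1), and summing `(m - 1) * (cords + tails + handles)` should give N P(2).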



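1. Erdős-Rényi networks

Erdős-Rényi networks have no degree correlations and
a Poissonian degree distribution:

$$P(k) = \frac{e^{-\langle k \rangle} \langle k \rangle^k}{k!}. \tag{16}$$

This gives the following expressions for the number of
rings, cords, tails and handles:

$$N_R(m) = \frac{1}{m} \langle k \rangle^m e^{-m \langle k \rangle}, \tag{17}$$

$$N_C(m) = \frac{N}{2} \langle k \rangle^m e^{-(m+1) \langle k \rangle}, \tag{18}$$

$$N_T(m) = N \langle k \rangle^m e^{-(m+1) \langle k \rangle} \epsilon, \tag{19}$$

$$N_H(m) = \frac{N}{2} \langle k \rangle^m e^{-(m+1) \langle k \rangle} \epsilon^2, \tag{20}$$

where $\epsilon = e^{\langle k \rangle} - \langle k \rangle - 1$. Figure 4 shows the comparison
of the results for networks with $N = 10^6$ vertices
and $L = 972\,941$ edges (this number of edges was chosen
to give the same average degree as for the scale-free network
discussed below). A total of 1 000 realizations of the
model were used to compute the averages and standard
deviations.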
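The Poisson case can be evaluated directly. The sketch below assumes N = 10^6 vertices together with the quoted number of edges, so that the average degree is 2L/N; the function name is hypothetical and the closed forms are the uncorrelated expressions specialized to a Poisson degree distribution.

```python
import math

def er_counts(m, N, mean_k):
    """Expected motif counts of length m for a Poisson degree distribution."""
    eps = math.exp(mean_k) - mean_k - 1
    base = mean_k ** m * math.exp(-(m + 1) * mean_k)
    return {
        "rings": mean_k ** m * math.exp(-m * mean_k) / m,
        "cords": 0.5 * N * base,
        "tails": N * base * eps,
        "handles": 0.5 * N * base * eps ** 2,
    }

N, L = 10 ** 6, 972_941          # N is an assumption consistent with L
mean_k = 2 * L / N               # average degree
decay = mean_k * math.exp(-mean_k)   # factor gained per extra unit of length
```

With these values `decay` is the ratio between the expected numbers of cords, tails or handles of lengths m + 1 and m, which governs how fast long chains become rare in this model.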



2. Scale-free networks

We now proceed to uncorrelated scale-free networks
with degree distribution given as

$$P(k) = \frac{k^{-\gamma}}{\zeta(\gamma)}, \tag{21}$$

where $\gamma$ is the power law coefficient and $\zeta(x)$ is the Riemann
zeta function. This distribution describes a strictly
scale-free network, with the power law valid for all values
of $k$ and a minimum $k_{\min} = 1$. The results are therefore
not directly applicable to scale-free real networks or
models. The average degree is $\langle k \rangle = \zeta(\gamma - 1)/\zeta(\gamma)$. The
resulting expressions are:

$$N_R(m) = \frac{2^{-m(\gamma - 1)}}{m\, \zeta(\gamma - 1)^m}, \tag{22}$$

$$N_C(m) = N \frac{2^{-(m-1)(\gamma - 1)}}{2\, \zeta(\gamma)\, \zeta(\gamma - 1)^m}, \tag{23}$$

$$N_T(m) = N \frac{2^{-(m-1)(\gamma - 1)}}{\zeta(\gamma)\, \zeta(\gamma - 1)^m}\, \beta, \tag{24}$$

$$N_H(m) = N \frac{2^{-(m-1)(\gamma - 1)}}{2\, \zeta(\gamma)\, \zeta(\gamma - 1)^m}\, \beta^2, \tag{25}$$

where $\beta = \zeta(\gamma - 1) - 1 - 2^{-(\gamma - 1)}$.

Figure 5 shows the comparison of the results for networks
with $N = 10^6$ vertices and $\gamma = 2.5$. A total of
1 000 realizations of the model were used to compute the
averages and standard deviations. A comparison with
Figure 4 shows that the Poisson degree distribution with
the same average degree presents larger chains. This is
due to the relation between the constants in the exponential
dependency with $m$: $\langle k \rangle / e^{\langle k \rangle} \approx 0.278$ for the Poisson
model and $2^{1-\gamma}/\zeta(\gamma - 1) \approx 0.135$ for the scale-free model.

FIG. 4: Number of cords (a), tails (b), and handles (c) of
different sizes in the model with Poisson degree distribution.
The points are the averaged measured values (each of the
error bars corresponds to one standard deviation), the lines
are the values computed analytically. Note that the abrupt
increase of the width of the error bars is a consequence of the
logarithmic scale.
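The scale-free expressions only require values of the Riemann zeta function. The sketch below approximates it with a truncated sum plus the leading Euler-Maclaurin tail terms, which avoids any external library; the helper names are assumptions, and a dedicated numerical library would normally be preferred in practice.

```python
import math

def zeta(s, terms=1000):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    M = terms
    head = sum(n ** -s for n in range(1, M))
    return head + M ** (1 - s) / (s - 1) + 0.5 * M ** -s + s * M ** (-s - 1) / 12

def scale_free_counts(m, N, gamma):
    """Expected motif counts of length m for P(k) = k**-gamma / zeta(gamma)."""
    z, z1 = zeta(gamma), zeta(gamma - 1)
    beta = z1 - 1 - 2 ** (-(gamma - 1))
    pref = 2 ** (-(m - 1) * (gamma - 1)) / (z * z1 ** m)
    return {
        "rings": 2 ** (-m * (gamma - 1)) / (m * z1 ** m),
        "cords": 0.5 * N * pref,
        "tails": N * pref * beta,
        "handles": 0.5 * N * pref * beta ** 2,
    }

gamma = 2.5
mean_k = zeta(gamma - 1) / zeta(gamma)       # average degree
decay = 2 ** (1 - gamma) / zeta(gamma - 1)   # factor per extra unit of length
```

For γ = 2.5 the decay factor is roughly half of the Poisson value at the same average degree, which is why chains of a given length are rarer in the scale-free model.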

The results presented in this section addressed the is- 
sue of validating the theory for analytical models. In 
Section |Vl we will evaluate the theory while considering 
real- world networks. 



VI. REAL-WORLD NETWORKS

It is known that networks belonging to the same class 
may share similar structural properties [1, [l^. So, to 
study the presence of handles in networks, we considered 
five types of complex networks, namely social networks, 
information networks, word adjacency networks in books, 
technological networks, and biological networks. 



A. Social networks 

Social networks are formed by people or groups of people
(firms, teams, economic classes) connected by some
type of interaction, such as friendship, business relationships
between companies, collaboration in science and participation
in movies or sports teams [2], to cite just a few
examples. Below we describe the social networks considered
in our analysis.

Scientific collaboration networks are formed by scientists
who are connected if they have authored a paper together.
In our investigations, we considered the astrophysics
collaboration network, the condensed matter collaboration
network, and the high-energy theory collaboration network,
all collected by Mark Newman from http://www.arxiv.org,
as well as the scientific collaboration network of complex
networks researchers, also compiled by Mark Newman from
the bibliographies of two review articles on networks (by
Newman '"^ and Boccaletti et al. [17]). The astrophysics
collaboration network is formed by scientists who posted
preprints on the astrophysics archive between the years
1995 and 1999 [3. The condensed matter collaboration
network, on the other hand, is composed of scientists who
posted preprints on the condensed matter archive from 1995
until 2005. Finally, the high-energy theory collaboration
network is composed of scientists who posted preprints on
the high-energy theory archive from 1995 until 1999.

FIG. 5: Number of cords (a), tails (b), and handles (c) of
different sizes in the model with scale-free degree distribution.
The points are the averaged measured values (each of the error
bars corresponds to one standard deviation), the lines are the
values computed analytically.


B. Information networks 

Roget's Thesaurus network is constructed by associating
each vertex of the network with one of the 1022 categories
in the 1879 edition of Peter Mark Roget's Thesaurus
of English Words and Phrases, edited by John
Lewis Roget [21]. Two categories i and j are linked if
Roget gave a reference to j among the words and phrases
of i, or if the two categories are directly related to each
other by their positions in Roget's book [21]. This network
is available at the Pajek datasets [23].

Wordnet is a semantic network which is often used as a
form of knowledge representation. It is a directed graph
consisting of concepts connected by semantic relations.
We collected the network from the Pajek datasets [23].

The World Wide Web is a network of Web pages belonging
to the nd.edu domain, linked together by hyperlinks
from one page to another [2^. The data considered in
our paper are available at the Center for Complex Network
Research [24].


C. Word adjacency in books 

Word adjacency in books can be represented as a net- 
work of words connected by proximity [1^. A directed 
edge is established between two words that are adjacent 
and its weight is the number of times the adjacent words 
appear in the text. Before constructing a network, the 
text must be preprocessed. All stop words (e.g. articles,
prepositions, conjunctions, etc.) are removed, and
the remaining words are lemmatized [25]. In our analysis,
we considered the books David Copperfield by Charles
Dickens, Night and Day by Virginia Woolf, and On the
Origin of Species by Charles Darwin, compiled by Antiqueira
et al. [26].
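The construction of a word adjacency network can be sketched as follows. This minimal Python example only removes stop words and counts adjacent pairs; lemmatization is not implemented and the tokens are assumed to be already lemmatized. The function name and the stop-word set are hypothetical.

```python
from collections import Counter

def word_adjacency(tokens, stop_words=frozenset()):
    """Directed, weighted adjacency network of a token sequence.

    Returns a Counter mapping (word, next_word) to the number of times the
    pair appears consecutively after stop-word removal.
    Assumption: tokens are already lemmatized.
    """
    words = [t.lower() for t in tokens if t.lower() not in stop_words]
    return Counter(zip(words, words[1:]))
```

Because each word is linked to the word that follows it, words that occur only once in the text acquire one incoming and one outgoing edge and become articulations, which is how the sequential construction tends to produce chains.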

D. Technological networks 

The Internet at the level of autonomous systems (AS) is a
network in which each vertex is an AS, i.e. a collection of IP
networks and routers under the control of one entity that
presents a common routing policy to the Internet. Each AS is
a large domain of IP addresses that usually belongs to one
organization such as a university, a business enterprise, or an
Internet Service Provider. In this type of network, two
vertices are connected according to BGP tables. The network
considered in our analysis was collected by Newman in July
2006 [27].

The US Airlines Transportation Network is formed
by US airports in 1997 connected by flights. This network
is available at the Pajek datasets [23].

The Western States Power Grid represents the
topology of the electrical distribution grid [28]. Vertices
represent generators, transformers and substations, and
edges the high-voltage transmission lines that connect
them.



E. Biological networks 

Some biological systems can be modeled in terms of
networks, such as the brain, genetic interactions and the
interactions between proteins.

The neural network of Caenorhabditis elegans
is composed of neurons connected according to
synapses [28, 29].

Transcriptional Regulation Network of the Es- 
cherichia coli is formed by operons (an operon is a 
group of contiguous genes that are transcribed into a 
single mRNA molecule). Each edge is directed from an 
operon that encodes a transcription factor to another 
operon which is regulated by that transcription factor. 
This kind of network plays an important role in control- 
ling gene expression [7]. 

The protein-protein interaction network of Saccharomyces
cerevisiae is formed by proteins connected
according to identified directed physical interactions [30].

VII. RESULTS AND DISCUSSION 

We analyzed the real-world networks by comparing
their number of cords, tails, and handles with random
networks generated by the rewiring procedure
described in [31] and with the theory proposed in
Section IV.
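A degree-preserving randomization can be sketched with repeated edge swaps. This is a common scheme and is given here only as an assumption about what such a rewiring procedure may look like; the exact procedure of the cited reference may differ. The function name and parameters are hypothetical, and a simple undirected graph with at least two edges is assumed.

```python
import random

def rewire(edges, n_swaps, seed=None):
    """Randomize a simple undirected graph while keeping every degree fixed.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), rejecting swaps that would create self-loops or
    multiple edges.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:         # consider both orientations of edge j
            c, d = d, c
        if a == d or c == b:           # would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:   # would create a multi-edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

The chain statistics of each rewired copy can then be compared with those of the original network, since the degree sequence (and hence the number of articulations) is unchanged by the swaps.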

A. Comparison between real-world networks and 
their randomized counterparts 

For each considered real-world network, we generated
1 000 randomized versions (100 for the WWW) by the
rewiring process described in [31]. The generated networks
have the same degree distribution as the original,
but without any degree-degree correlation. In order to
compare the chain statistics obtained for the real-world
networks and their respective randomized versions, we
evaluated the Z-score values for each size of the cords, tails,
and handles. The Z-score is given by

$$Z = \frac{X_{\mathrm{Real}} - \langle X \rangle}{\sigma}, \tag{26}$$

where $X_{\mathrm{Real}}$ is the number of cords, tails, or handles
with a specific size in the original (real-world) network,
and $\langle X \rangle$ and $\sigma$ are, respectively, the average and
the standard deviation of the corresponding values in its
randomized counterparts. A null value of the Z-score
indicates that there is no statistical difference between
the number of occurrences of cords, tails, or handles in
the considered network and in its randomized versions.

The Z-scores for all considered networks are shown
in Figure 6. The cases in which the Z-score
values are not defined ($\sigma = 0$) were disregarded.
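The Z-score of Eq. (26) is straightforward to compute once the motif counts of the randomized networks are available. In the Python sketch below the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def z_score(x_real, x_random):
    """Z-score of a real-network count against its randomized counterparts.

    Returns None when the standard deviation vanishes (Z-score undefined).
    Assumption: population standard deviation.
    """
    mean = statistics.fmean(x_random)
    sigma = statistics.pstdev(x_random)
    if sigma == 0:
        return None
    return (x_real - mean) / sigma
```

Returning `None` for a vanishing standard deviation mirrors the choice of disregarding the undefined cases.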

The majority of the results presented in Figure 6 can be
explained by the fact that the rewiring process tends to
make the size distributions of cords, tails and handles
more uniform. In this way, an excess of these structures in
the real networks is reduced in the random counterparts.
For instance, if a network has many large handles, its
random version will present few large handles but many
small ones. The following discussion does not take into account
the shape of the distribution of chains, but just the most
important results.

In the case of collaboration networks, there is a large
quantity of cords. This fact suggests that many researchers
published papers with just one, two or three other scientists.
Cords may appear because many researchers also
publish in other areas and, therefore, such papers are not
included in the network. If other research areas had been
considered, this effect might not occur and the number of
small cords would be less significant. Thus, the presence
of cords in collaboration networks can be the result
of database incompleteness. Another possible cause of
cords in such networks is the case of authors
who publish only among themselves.

The information networks do not present a well-defined
pattern such as that observed in the collaboration networks.
The Roget thesaurus network differs from the others, but
the results obtained for this network are not expressive
enough to be discussed. It is important to note that in
Wordnet and the WWW there is a large occurrence of
tails of size one. In the case of Wordnet, this happens
because specific words are connected to more common
words, which in turn are connected to the remainder of
the network. In the case of the WWW, this structure is a
consequence of characteristic URL documents that have
just one link. In addition to small tails, the WWW has
long tails and handles. This fact can be associated with the
way in which the network was constructed, namely by means
of a web crawler [2^ (a program designed to visit URL
documents inside a given domain and collect the links between
them recursively). When pages are visited by the crawler, the
path followed can give rise to chains. If the program is not
executed for a long enough time interval, long chains can
appear. Thus, this effect can result from incomplete sampling
(see Subsection VII C). Besides, as the process of network
construction is recursive, isolated components do not occur
in the database and therefore there are no cords or rings.

The book adjacency networks present a characteristic pattern of
chains: no cords, the same quantity of tails of sizes 1, 2 and 3
as observed in the random counterparts, and many handles of size
1, 3, 4 and 5. The increase in the quantity of handles of size 2
in the random versions is a consequence of the fact that, when the
rewiring process is performed, many handles of size one can be
joined together. This fact explains why the book networks present
more handles of size one than their random counterparts. On the
other hand, the long handles are a consequence of the sequential
process used to obtain the network.

FIG. 6: Z-scores of the number of cords, tails, and handles for each size. The number of generated random networks was 1 000
for all considered networks, except for WWW, which was 100 (because of the substantially larger size of this network).
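
The comparison with random counterparts summarized by the Z-scores of Fig. 6 can be illustrated with a minimal Python sketch. It assumes the usual definition of the Z-score (difference between the count in the real network and the mean count over the randomized networks, divided by the standard deviation of the latter). The function name `z_score` and the use of the population standard deviation are illustrative assumptions, not taken from the original analysis.

```python
from statistics import mean, pstdev


def z_score(real_count, random_counts):
    """Z-score of a chain count in a real network against the counts
    obtained from its randomized counterparts (assumed definition)."""
    mu = mean(random_counts)
    sigma = pstdev(random_counts)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all randomized networks gave the same count")
    return (real_count - mu) / sigma


# Hypothetical counts of handles of a given size in four randomized networks.
print(z_score(12, [8, 10, 12, 10]))
```

A positive value indicates that the chain type is more frequent in the real network than in its randomized versions, and a negative value that it is less frequent.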

In technological networks, the chain patterns are most significant
in the power grid. This network presents a high quantity of tails
of size one and handles of size 11. While the first occurrence
appears to be related to a geographical effect, where new vertices
needed to cover a new region tend to connect to nearby vertices,
the second can result from geographical constraints (e.g. the
transmitters may be placed strategically in order to contour a
mountain, a lake or other geographical features).

The results obtained for biological networks are not so
expressive. However, the protein interaction network of the yeast
S. cerevisiae has many cords of size one and two. The presence of
small cords in this network is a consequence of isolated chains of
proteins which interact only with a small number of other
proteins. This fact can be due to incompleteness [s^l, where many
real connections may not have been considered, or to highly
specialized proteins which lost many connections through mutation
(protein interaction networks evolve through two basic processes:
duplication and mutation [33]).



B. Theoretical analysis of the real-world networks

Going back to the analysis presented in Section IIVI 
we applied those theoretical developments to the considered
real-world networks. We obtained their degree-degree correlations
and computed the expected number of cords, tails, and handles as a
function of their sizes by
Equations (O, ([5]), and respectively. The number of 
rings was not taken into account because of their very low
probability of appearing in real-world networks. The results of
the theoretical analysis are shown in Figure 7. The cases not
shown are those in which all chains are smaller than 2. Due to the
low probability of
finding cords in networks, only three networks are shown
(Figure 7(a)), namely: cond-mat, high-energy collaborations and
the Wordnet. The theoretical prediction does not work well for
these networks, except for the Wordnet, predicting fewer cords
than those found in the real networks. An opposite situation was
found for the number of tails and handles, shown in Figure 7(b)
and (c), respectively. However, there are more large tails and
handles in the real-world networks than predicted by theory,
except for the astrophysics, cond-mat, and high-energy
collaboration networks.

Despite the fact that, for some cases, the numbers of small cords,
tails, and handles of the real-world networks were far from the
values obtained from their respective randomized counterparts (see
Figure 6), the theoretical results were accurate for several
cases. The exceptions were astrophysics (handles), netscience
(tails), cond-mat (cords and handles), high-energy (cords, tails,
and handles), WWW (tails and handles), the book On the origin of
species (handles), and power grid (handles) (see Figure 7).

C. Analysis of incomplete networks 

In order to investigate the possibility that incomplete networks
present many tails and handles, we sampled two theoretical network
models, namely the Erdős-Rényi model (ER) [34] and the Barabási
and Albert scale-free model (BA) [35], by performing random walks
[s^ [13], and analyzed the corresponding distributions of tails
and handles. The ER and BA models included 100 000 vertices with
average degree 6. The results of the random walks in these
theoretical networks are shown in Figure 8. Each point of the mesh
grid is the average value over 1 000 realizations.
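
The sampling procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the network is given as a dictionary mapping each vertex to the set of its neighbours, the walk starts at a uniformly chosen vertex, and the sampled network consists of every vertex and edge traversed by the walk. The function name `random_walk_sample` and its parameters are hypothetical and do not come from the original work.

```python
import random


def random_walk_sample(adj, steps, seed=None):
    """Return the subgraph (vertex -> set of neighbours) formed by the
    edges traversed during a random walk of `steps` steps on `adj`."""
    rng = random.Random(seed)
    current = rng.choice(sorted(adj))       # uniformly chosen start (assumption)
    sampled = {current: set()}
    for _ in range(steps):
        neighbours = sorted(adj[current])
        if not neighbours:                  # isolated vertex: the walk cannot move
            break
        nxt = rng.choice(neighbours)
        sampled.setdefault(nxt, set())
        sampled[current].add(nxt)           # record the traversed edge
        sampled[nxt].add(current)           # in both directions (undirected)
        current = nxt
    return sampled


# Usage on a small ring of 10 vertices (toy example, not the networks of the paper).
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
sample = random_walk_sample(ring, steps=5, seed=1)
print(len(sample), "vertices visited")
```

Short walks traverse few edges, so most visited vertices keep only the one or two edges used to enter and leave them, which is what favours long chains in the sampled network.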

For the ER and BA models the results are very similar, with the
difference that the tails tend to vanish for longer random walks
(almost 10^ steps) in the BA model. This is not the case for the
ER network because its original structure already contains
vertices with unit degree and, therefore, small tails (sizes 1 and
2). Conversely, BA networks of average vertex degree 6 do not have
tails, so for long random walks these structures tend to vanish.

The results from Figure 8 clearly indicate that there are many
large tails and handles for both models when the random walks are
relatively short. As the length of the random walks is increased,
the number of large tails and handles tends to decrease, while the
number of small tails and handles increases, because longer random
walks are more likely to break large tails and handles into
smaller parts. As the length of the random walks increases
further, the large tails and handles tend to vanish and the
original networks are recovered.
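
To make the quantities discussed in this subsection concrete, the following Python sketch counts tails and handles by size in an undirected simple graph. It is not the identification algorithm presented earlier in the paper, only an illustration under explicit assumptions: a tail starts at a vertex of degree one and ends at a vertex of degree greater than two, a handle is a sequence of degree-two vertices whose two ends attach to vertices of degree greater than two, and the size of a chain is taken as its number of vertices of degree one or two (this size convention is an assumption and may differ from the one used in the paper). The name `count_tails_and_handles` is hypothetical.

```python
from collections import Counter


def count_tails_and_handles(adj):
    """Count tails and handles by size; `adj` maps vertex -> set of neighbours."""
    deg = {v: len(nb) for v, nb in adj.items()}
    tails, handles = Counter(), Counter()
    seen = set()  # degree-two vertices already assigned to a chain

    def walk(prev, cur):
        """Follow degree-two vertices starting at `cur`, moving away from `prev`."""
        chain = []
        while deg[cur] == 2:
            chain.append(cur)
            (nxt,) = adj[cur] - {prev}      # the only neighbour other than `prev`
            prev, cur = cur, nxt
        return chain, cur                   # `cur` is the first vertex with degree != 2

    # Tails: start at every degree-one vertex and walk inwards.
    for leaf in adj:
        if deg[leaf] != 1:
            continue
        (first,) = adj[leaf]
        chain, end = walk(leaf, first)
        if deg[end] > 2:                    # if deg[end] == 1 the chain is a cord
            tails[len(chain) + 1] += 1      # +1 counts the degree-one vertex
            seen.update(chain)

    # Handles: start at every hub and follow each unvisited degree-two neighbour.
    for hub in adj:
        if deg[hub] <= 2:
            continue
        for nb in adj[hub]:
            if deg[nb] != 2 or nb in seen:
                continue
            chain, end = walk(hub, nb)
            if deg[end] > 2:
                handles[len(chain)] += 1
            seen.update(chain)
    return tails, handles
```

Applying this function to networks sampled by random walks of increasing length gives a qualitative view of how the distributions of tails and handles change with the degree of incompleteness.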

VIII. CONCLUSIONS

One of the most important aspects characterizing different types
of complex networks concerns the distribution of specific
connecting patterns, such as the traditionally investigated
motifs. In the present work we considered specific connecting
patterns consisting of chains of articulations, i.e. linear
sequences of interconnected vertices with only two neighbors. This
new type of motif has been subdivided into cords (i.e. chains with
both extremities free), rings (i.e. chains with no free
extremities but disconnected from the remainder of the network),
tails (i.e. chains with only one free extremity) and handles (i.e.
chains with no free extremity, attached to the remainder of the
network). By considering a large number of representative
theoretical and real-world networks, we found that many types of
networks tend to exhibit specific distributions of cords, tails,
and handles. We provided an algorithm to identify such motifs in
generic networks. We also developed an analytical framework to
predict the number of chains in random network models, scale-free
network models and real-world networks, which provided accurate
approximations for several of the considered networks. Finally, we
investigated the presence of chains by considering Z-score values
(i.e. comparing the presence of chains in real networks and in the
respective random counterparts). The specific origin of handles
and tails is likely related either to the evolution of each type
of network or to incompleteness arising from sampling. In the
first case, the handles and tails in geographical networks may be
mainly a consequence of the chaining effect obtained by connecting
vertices that are spatially near or adjacent to one another. In
the second, we showed that incomplete sampling of networks by
random walks can produce specific types of chains.

All in all, the results obtained in our analysis indi- 
cate that handles and tails are present in several impor- 
tant real-world networks, while being largely absent in 
the randomized versions and in the considered theoreti- 
cal models. The study of such motifs is particularly im- 
portant because they can provide clues about the way in 
which each type of network was grown. Several future in- 
vestigations are possible, including the proposal of mod- 
els for generation of networks with specific distribution 
of handles and tails, as well as additional experiments 
aimed at studying the evolution of handles and tails in 
growing networks such as the WWW and the Internet. 
FIG. 7: The distributions shown in (a), (b), and (c) correspond to the most significant data (each distribution has at least
three points). Points correspond to the real data, and the solid lines correspond to the theoretical predictions.






FIG. 8: Figures (a) and (b) present the number of tails and
handles of different sizes in the Erdős-Rényi model,
respectively. Figures (c) and (d), on the other hand, present the
number of tails and handles for the Barabási and Albert
scale-free model, respectively. Each point in the mesh grid is the
average considering 1 000 realizations of each random walk.